Geospatial analysis applies statistical and other analytic techniques to data that has a geographical or spatial aspect. Such analysis typically employs software capable of rendering maps, processing spatial data, and applying analytical methods to terrestrial or geographic datasets, including geographic information systems and geomatics.
This notebook covers the following exciting features:
The required packages and libraries are imported:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
The urban population and total population CSV files are read with the pandas ".read_csv" function, and each file is stored in a pandas dataframe.
df_urban = pd.read_csv('API_SP.URB.TOTL_DS2_en_csv_v2_1868968.csv', skiprows=4)
df_urban.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | Urban population | SP.URB.TOTL | 27526.0 | 28141.0 | 28532.0 | 28761.0 | 28924.0 | 29082.0 | ... | 43819.0 | 44057.0 | 44348.0 | 44665.0 | 44979.0 | 45296.0 | 45616.0 | 45948.0 | 46295.0 | NaN |
| 1 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 7212518.0 | 7528588.0 | 7865067.0 | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | NaN |
| 2 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 14660282.0 | 15383127.0 | 16130304.0 | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | NaN |
| 3 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1546929.0 | 1575788.0 | 1603505.0 | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | NaN |
| 4 | Andorra | AND | Urban population | SP.URB.TOTL | 7839.0 | 8766.0 | 9754.0 | 10811.0 | 11915.0 | 13067.0 | ... | 74305.0 | 73056.0 | 71515.0 | 70057.0 | 68919.0 | 68213.0 | 67876.0 | 67813.0 | 67873.0 | NaN |
5 rows × 65 columns
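The "skiprows=4" argument skips the lines that precede the header row in the downloaded file. A minimal sketch of that behaviour on an in-memory file follows; the four metadata lines and the two-year layout are hypothetical stand-ins, and only the Aruba and Afghanistan 2019 figures are taken from the table above.

```python
import io

import pandas as pd

# Hypothetical file layout: four metadata lines, then the header row.
raw_csv = (
    "metadata line 1\n"
    "metadata line 2\n"
    "metadata line 3\n"
    "metadata line 4\n"
    "Country Name,Country Code,2019,2020\n"
    "Aruba,ABW,46295.0,\n"
    "Afghanistan,AFG,9797273.0,\n"
)

# skiprows=4 discards the first four lines, so line 5 becomes the header.
df_demo = pd.read_csv(io.StringIO(raw_csv), skiprows=4)
print(df_demo)
```

The empty "2020" fields are read as NaN, which mirrors the all-NaN 2020 column in the real data.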
df_total = pd.read_csv('API_SP.POP.TOTL_DS2_en_csv_v2_1637443.csv', skiprows=4)
df_total.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | Population, total | SP.POP.TOTL | 54211.0 | 55438.0 | 56225.0 | 56695.0 | 57032.0 | 57360.0 | ... | 102046.0 | 102560.0 | 103159.0 | 103774.0 | 104341.0 | 104872.0 | 105366.0 | 105845.0 | 106314.0 | NaN |
| 1 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 30117413.0 | 31161376.0 | 32269589.0 | 33370794.0 | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | NaN |
| 2 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 24220661.0 | 25107931.0 | 26015780.0 | 26941779.0 | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | NaN |
| 3 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2905195.0 | 2900401.0 | 2895092.0 | 2889104.0 | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | NaN |
| 4 | Andorra | AND | Population, total | SP.POP.TOTL | 13411.0 | 14375.0 | 15370.0 | 16412.0 | 17469.0 | 18549.0 | ... | 83747.0 | 82427.0 | 80774.0 | 79213.0 | 78011.0 | 77297.0 | 77001.0 | 77006.0 | 77142.0 | NaN |
5 rows × 65 columns
df_urban.shape
(264, 65)
The ".shape" attribute of a pandas dataframe gives its number of rows and columns. The urban population dataframe initially has 264 rows and 65 columns.
The ".isnull()" function marks which values in the dataset are null.
sns.heatmap(df_urban.isnull(),yticklabels=False,cbar=False,cmap='viridis')
print('We have {} NaN/Null values in Dataset'.format(df_urban.isnull().values.sum()))
We have 553 NaN/Null values in Dataset
The yellow lines in the heatmap above indicate the null values in the dataset. There are 553 null values, which are cleaned with the ".dropna" function.
The ".dropna" function drops rows or columns that contain null values: "axis=1" applies it to columns and "axis=0" to rows. With "how='all'", only columns that are entirely null are dropped; the default, "how='any'", drops every row that has at least one null value.
df_urban=df_urban.dropna(how='all', axis=1)
df_urban = df_urban.dropna(axis=0)
sns.heatmap(df_urban.isnull(),yticklabels=False,cbar=False,cmap='viridis')
print('We have {} NaN/Null values in Dataset'.format(df_urban.isnull().values.sum()))
We have 0 NaN/Null values in Dataset
After applying the ".dropna" function, the urban population dataset has zero null values.
df_urban.shape
(256, 64)
After cleaning, the dataframe has 256 rows and 64 columns.
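The two-step cleaning can be reproduced on a toy dataframe to see what each ".dropna" call removes. This is a minimal sketch; the country codes and values in "demo" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: "2020" is entirely null, and row "BBB" has a null in "2019".
demo = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC"],
    "2019": [1.0, np.nan, 3.0],
    "2020": [np.nan, np.nan, np.nan],
})

null_count_before = int(demo.isnull().values.sum())

# Step 1: drop only the columns in which every value is null.
no_empty_cols = demo.dropna(how="all", axis=1)

# Step 2: drop every row that still contains at least one null.
complete_rows = no_empty_cols.dropna(axis=0)

null_count_after = int(complete_rows.isnull().values.sum())
print(null_count_before, complete_rows.shape, null_count_after)
```

Dropping the all-null column first matters: if the rows were dropped first, every row would be removed because each one has a null in "2020".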
Checking the shape of Total Population Dataset
df_total.shape
(264, 65)
sns.heatmap(df_total.isnull(),yticklabels=False,cbar=False,cmap='viridis')
print('We have {} NaN/Null values in Dataset'.format(df_total.isnull().values.sum()))
We have 433 NaN/Null values in Dataset
There are 433 null values in the dataset, which are cleaned with the ".dropna" function.
df_total=df_total.dropna(how='all', axis=1)
df_total = df_total.dropna(axis=0)
sns.heatmap(df_total.isnull(),yticklabels=False,cbar=False,cmap='viridis')
print('We have {} NaN/Null values in Dataset'.format(df_total.isnull().values.sum()))
We have 0 NaN/Null values in Dataset
After applying the ".dropna" function, the total population dataset has zero null values.
df_total.shape
(258, 64)
After cleaning, the dataframe has 258 rows and 64 columns.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head()
| | pop_est | continent | name | iso_a3 | gdp_md_est | geometry |
|---|---|---|---|---|---|---|
| 0 | 920938 | Oceania | Fiji | FJI | 8374.0 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | 53950935 | Africa | Tanzania | TZA | 150600.0 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | 603253 | Africa | W. Sahara | ESH | 906.5 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | 35623680 | North America | Canada | CAN | 1674000.0 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | 326625791 | North America | United States of America | USA | 18560000.0 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
Only the "iso_a3" and "geometry" columns of the built-in geopandas world dataset are selected, because only these columns are used later.
world_gdf= world[['iso_a3','geometry']]
world_gdf.head()
| | iso_a3 | geometry |
|---|---|---|
| 0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | ESH | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
The "iso_a3" column of the world dataset is renamed to "Country Code", because this column is the key used to merge the world dataset with both the urban population and total population datasets.
world_gdf = world_gdf.rename(columns = {'iso_a3': 'Country Code'}, inplace = False)
world_gdf.head()
| | Country Code | geometry |
|---|---|---|
| 0 | FJI | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | TZA | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | ESH | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | CAN | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | USA | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
The urban population dataframe is merged with the world geodataframe using the pandas ".merge" function.
df_urban_merged = df_urban.merge(world_gdf, on='Country Code', how='left')
df_urban_merged.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | Urban population | SP.URB.TOTL | 27526.0 | 28141.0 | 28532.0 | 28761.0 | 28924.0 | 29082.0 | ... | 43819.0 | 44057.0 | 44348.0 | 44665.0 | 44979.0 | 45296.0 | 45616.0 | 45948.0 | 46295.0 | None |
| 1 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 7212518.0 | 7528588.0 | 7865067.0 | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | POLYGON ((66.51861 37.36278, 67.07578 37.35614... |
| 2 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 14660282.0 | 15383127.0 | 16130304.0 | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | MULTIPOLYGON (((12.99552 -4.78110, 12.63161 -4... |
| 3 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1546929.0 | 1575788.0 | 1603505.0 | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | POLYGON ((21.02004 40.84273, 20.99999 40.58000... |
| 4 | Andorra | AND | Urban population | SP.URB.TOTL | 7839.0 | 8766.0 | 9754.0 | 10811.0 | 11915.0 | 13067.0 | ... | 74305.0 | 73056.0 | 71515.0 | 70057.0 | 68919.0 | 68213.0 | 67876.0 | 67813.0 | 67873.0 | None |
5 rows × 65 columns
The total population dataframe is merged with the world geodataframe using the pandas ".merge" function.
df_total_merged = df_total.merge(world_gdf, on='Country Code', how='left')
df_total_merged.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | Population, total | SP.POP.TOTL | 54211.0 | 55438.0 | 56225.0 | 56695.0 | 57032.0 | 57360.0 | ... | 102046.0 | 102560.0 | 103159.0 | 103774.0 | 104341.0 | 104872.0 | 105366.0 | 105845.0 | 106314.0 | None |
| 1 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 30117413.0 | 31161376.0 | 32269589.0 | 33370794.0 | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | POLYGON ((66.51861 37.36278, 67.07578 37.35614... |
| 2 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 24220661.0 | 25107931.0 | 26015780.0 | 26941779.0 | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | MULTIPOLYGON (((12.99552 -4.78110, 12.63161 -4... |
| 3 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2905195.0 | 2900401.0 | 2895092.0 | 2889104.0 | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | POLYGON ((21.02004 40.84273, 20.99999 40.58000... |
| 4 | Andorra | AND | Population, total | SP.POP.TOTL | 13411.0 | 14375.0 | 15370.0 | 16412.0 | 17469.0 | 18549.0 | ... | 83747.0 | 82427.0 | 80774.0 | 79213.0 | 78011.0 | 77297.0 | 77001.0 | 77006.0 | 77142.0 | None |
5 rows × 65 columns
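Because the merge uses "how='left'", rows such as Aruba and Andorra that have no matching "Country Code" in the world dataset are kept, with "geometry" set to None. One way to list those rows before dropping them is the "indicator" option of ".merge", sketched here on a three-row stand-in; the geometry strings are placeholders, not real shapes.

```python
import pandas as pd

# Stand-in for the population frame (2019 urban values from the table above).
population = pd.DataFrame({
    "Country Code": ["ABW", "AFG", "AGO"],
    "2019": [46295.0, 9797273.0, 21061025.0],
})

# Stand-in for world_gdf; the geometry values are placeholder strings.
shapes = pd.DataFrame({
    "Country Code": ["AFG", "AGO"],
    "geometry": ["POLYGON (...)", "MULTIPOLYGON (...)"],
})

# indicator=True adds a "_merge" column recording where each row came from.
merged = population.merge(shapes, on="Country Code", how="left", indicator=True)

# Rows marked "left_only" found no geometry in the right-hand frame.
unmatched = merged.loc[merged["_merge"] == "left_only", "Country Code"].tolist()
print(unmatched)
```

Checking this list makes it clear which countries the later ".dropna(how='any', axis=0)" call will remove from the maps.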
df_urban_merged=df_urban_merged.dropna(how='any', axis=0)
df_urban_merged.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 7212518.0 | 7528588.0 | 7865067.0 | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | POLYGON ((66.51861 37.36278, 67.07578 37.35614... |
| 2 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 14660282.0 | 15383127.0 | 16130304.0 | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | MULTIPOLYGON (((12.99552 -4.78110, 12.63161 -4... |
| 3 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1546929.0 | 1575788.0 | 1603505.0 | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | POLYGON ((21.02004 40.84273, 20.99999 40.58000... |
| 6 | United Arab Emirates | ARE | Urban population | SP.URB.TOTL | 67927.0 | 74975.0 | 84367.0 | 95215.0 | 106178.0 | 116473.0 | ... | 7553138.0 | 7747411.0 | 7824294.0 | 7866602.0 | 7935897.0 | 8047166.0 | 8182523.0 | 8332898.0 | 8479744.0 | POLYGON ((51.57952 24.24550, 51.75744 24.29407... |
| 7 | Argentina | ARG | Urban population | SP.URB.TOTL | 15076842.0 | 15449950.0 | 15815502.0 | 16183085.0 | 16552517.0 | 16923103.0 | ... | 37543830.0 | 38027774.0 | 38509756.0 | 38990109.0 | 39467043.0 | 39940546.0 | 40410674.0 | 40877099.0 | 41339571.0 | MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... |
5 rows × 65 columns
df_total_merged=df_total_merged.dropna(how='any', axis=0)
df_total_merged.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 30117413.0 | 31161376.0 | 32269589.0 | 33370794.0 | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | POLYGON ((66.51861 37.36278, 67.07578 37.35614... |
| 2 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 24220661.0 | 25107931.0 | 26015780.0 | 26941779.0 | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | MULTIPOLYGON (((12.99552 -4.78110, 12.63161 -4... |
| 3 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2905195.0 | 2900401.0 | 2895092.0 | 2889104.0 | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | POLYGON ((21.02004 40.84273, 20.99999 40.58000... |
| 6 | United Arab Emirates | ARE | Population, total | SP.POP.TOTL | 92418.0 | 100796.0 | 112118.0 | 125130.0 | 138039.0 | 149857.0 | ... | 8946777.0 | 9141596.0 | 9197910.0 | 9214175.0 | 9262900.0 | 9360980.0 | 9487203.0 | 9630959.0 | 9770529.0 | POLYGON ((51.57952 24.24550, 51.75744 24.29407... |
| 7 | Argentina | ARG | Population, total | SP.POP.TOTL | 20481779.0 | 20817266.0 | 21153052.0 | 21488912.0 | 21824425.0 | 22159650.0 | ... | 41261490.0 | 41733271.0 | 42202935.0 | 42669500.0 | 43131966.0 | 43590368.0 | 44044811.0 | 44494502.0 | 44938712.0 | MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... |
5 rows × 65 columns
type(df_urban_merged)
pandas.core.frame.DataFrame
type(df_total_merged)
pandas.core.frame.DataFrame
The urban population pandas dataframe is converted to a geopandas GeoDataFrame with the "gpd.GeoDataFrame" constructor, using the geometry column of the dataframe.
gdf_urban = gpd.GeoDataFrame(df_urban_merged, geometry=df_urban_merged.geometry, crs='EPSG:4326')
gdf_urban.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 7212518.0 | 7528588.0 | 7865067.0 | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | POLYGON ((66.51861 37.36278, 67.07578 37.35614... |
| 2 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 14660282.0 | 15383127.0 | 16130304.0 | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | MULTIPOLYGON (((12.99552 -4.78110, 12.63161 -4... |
| 3 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1546929.0 | 1575788.0 | 1603505.0 | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | POLYGON ((21.02004 40.84273, 20.99999 40.58000... |
| 6 | United Arab Emirates | ARE | Urban population | SP.URB.TOTL | 67927.0 | 74975.0 | 84367.0 | 95215.0 | 106178.0 | 116473.0 | ... | 7553138.0 | 7747411.0 | 7824294.0 | 7866602.0 | 7935897.0 | 8047166.0 | 8182523.0 | 8332898.0 | 8479744.0 | POLYGON ((51.57952 24.24550, 51.75744 24.29407... |
| 7 | Argentina | ARG | Urban population | SP.URB.TOTL | 15076842.0 | 15449950.0 | 15815502.0 | 16183085.0 | 16552517.0 | 16923103.0 | ... | 37543830.0 | 38027774.0 | 38509756.0 | 38990109.0 | 39467043.0 | 39940546.0 | 40410674.0 | 40877099.0 | 41339571.0 | MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... |
5 rows × 65 columns
The total population pandas dataframe is converted to a geopandas GeoDataFrame with the "gpd.GeoDataFrame" constructor, using the geometry column of the dataframe.
gdf_total = gpd.GeoDataFrame(df_total_merged, geometry=df_total_merged.geometry, crs='EPSG:4326')
gdf_total.head()
| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 30117413.0 | 31161376.0 | 32269589.0 | 33370794.0 | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | POLYGON ((66.51861 37.36278, 67.07578 37.35614... |
| 2 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 24220661.0 | 25107931.0 | 26015780.0 | 26941779.0 | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | MULTIPOLYGON (((12.99552 -4.78110, 12.63161 -4... |
| 3 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2905195.0 | 2900401.0 | 2895092.0 | 2889104.0 | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | POLYGON ((21.02004 40.84273, 20.99999 40.58000... |
| 6 | United Arab Emirates | ARE | Population, total | SP.POP.TOTL | 92418.0 | 100796.0 | 112118.0 | 125130.0 | 138039.0 | 149857.0 | ... | 8946777.0 | 9141596.0 | 9197910.0 | 9214175.0 | 9262900.0 | 9360980.0 | 9487203.0 | 9630959.0 | 9770529.0 | POLYGON ((51.57952 24.24550, 51.75744 24.29407... |
| 7 | Argentina | ARG | Population, total | SP.POP.TOTL | 20481779.0 | 20817266.0 | 21153052.0 | 21488912.0 | 21824425.0 | 22159650.0 | ... | 41261490.0 | 41733271.0 | 42202935.0 | 42669500.0 | 43131966.0 | 43590368.0 | 44044811.0 | 44494502.0 | 44938712.0 | MULTIPOLYGON (((-68.63401 -52.63637, -68.25000... |
5 rows × 65 columns
type(gdf_urban)
geopandas.geodataframe.GeoDataFrame
type(gdf_total)
geopandas.geodataframe.GeoDataFrame
gdf_urban.to_file("urban_population.shp")
<ipython-input-27-88df5f7e82de>:1: UserWarning: Column names longer than 10 characters will be truncated when saved to ESRI Shapefile.
gdf_urban.to_file("urban_population.shp")
gdf_total.to_file("total_population.shp")
<ipython-input-28-21146846c4c3>:1: UserWarning: Column names longer than 10 characters will be truncated when saved to ESRI Shapefile.
gdf_total.to_file("total_population.shp")
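The warning above means that the ESRI Shapefile format limits column names to 10 characters, so longer names are truncated automatically when the file is written. One possible way to keep control of the names is to rename the long columns before saving. The sketch below uses an empty stand-in dataframe, and the short names in "SHORT_NAMES" are hypothetical choices, not names used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical short names (10 characters or fewer) for the long columns.
SHORT_NAMES = {
    "Country Name": "country",
    "Country Code": "iso_a3",
    "Indicator Name": "ind_name",
    "Indicator Code": "ind_code",
}

# Empty stand-in with the same kind of column layout as the population frames.
frame = pd.DataFrame(columns=[
    "Country Name", "Country Code", "Indicator Name", "Indicator Code",
    "1960", "2019",
])

renamed = frame.rename(columns=SHORT_NAMES)

# Any name still longer than 10 characters would be truncated on save.
too_long = [name for name in renamed.columns if len(name) > 10]
print(list(renamed.columns), too_long)
```

Explicit names avoid having to work with automatically shortened column names after the shapefile is read back.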
The spatial data files for urban population and total population are read with the geopandas ".read_file" function.
gdf_urban_shp = gpd.read_file('urban_population.shp')
gdf_total_shp = gpd.read_file('total_population.shp')
The Coordinate Reference System (CRS) of a spatial object tells Python where its geometries are located in geographic space. It also tells Python what mathematical method should be used to “flatten”, or project, those geometries onto a plane.
gdf_urban_shp.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
gdf_urban_shp.plot()
<AxesSubplot:>
gdf_urban_shp = gdf_urban_shp.to_crs(epsg = 3395)
gdf_urban_shp.plot()
<AxesSubplot:>
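EPSG:3395 is the "WGS 84 / World Mercator" projection, so ".to_crs(epsg = 3395)" converts longitude and latitude in degrees to metres on a Mercator plane. As an illustration of what this involves for a single point, the sketch below applies the standard ellipsoidal Mercator formulas by hand; the function name "world_mercator" is hypothetical, and geopandas delegates the real transformation to pyproj rather than using code like this.

```python
import math

WGS84_A = 6378137.0          # WGS 84 semi-major axis, in metres
WGS84_E2 = 0.00669437999014  # WGS 84 first eccentricity squared


def world_mercator(lon_deg, lat_deg):
    """Project a lon/lat pair in degrees to ellipsoidal Mercator metres."""
    e = math.sqrt(WGS84_E2)
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = WGS84_A * lam
    # Isometric latitude on the ellipsoid, scaled by the semi-major axis.
    y = WGS84_A * (math.atanh(math.sin(phi)) - e * math.atanh(e * math.sin(phi)))
    return x, y


# First vertex of the Afghanistan polygon from the EPSG:4326 tables.
x, y = world_mercator(66.51861, 37.36278)
print(round(x, 1), round(y, 1))
```

The result should land within a few metres of the first Afghanistan vertex in the reprojected tables further down, (7404817.438, 4463862.784); a small difference is expected because the input coordinates are rounded to five decimals.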
gdf_total_shp.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
gdf_total_shp.plot()
<AxesSubplot:>
gdf_total_shp = gdf_total_shp.to_crs(epsg = 3395)
gdf_total_shp.plot()
<AxesSubplot:>
We can calculate urban population per capita by dividing urban population by total population.
gdf_urban_shp['UPC_1990'] = gdf_urban_shp['1990'] / gdf_total_shp['1990']
gdf_urban_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry | UPC_1990 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 7528588.0 | 7865067.0 | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | POLYGON ((7404817.438 4463862.784, 7466841.908... | 0.21177 |
| 1 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 15383127.0 | 16130304.0 | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... | 0.37144 |
| 2 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1575788.0 | 1603505.0 | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | POLYGON ((2339940.185 4961221.199, 2337708.178... | 0.36428 |
| 3 | United Arab Emirates | ARE | Urban population | SP.URB.TOTL | 67927.0 | 74975.0 | 84367.0 | 95215.0 | 106178.0 | 116473.0 | ... | 7747411.0 | 7824294.0 | 7866602.0 | 7935897.0 | 8047166.0 | 8182523.0 | 8332898.0 | 8479744.0 | POLYGON ((5741805.754 2765811.385, 5761611.935... | 0.79051 |
| 4 | Argentina | ARG | Urban population | SP.URB.TOTL | 15076842.0 | 15449950.0 | 15815502.0 | 16183085.0 | 16552517.0 | 16923103.0 | ... | 38027774.0 | 38509756.0 | 38990109.0 | 39467043.0 | 39940546.0 | 40410674.0 | 40877099.0 | 41339571.0 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... | 0.86984 |
5 rows × 66 columns
The world urban population per capita for the year "1990" is saved in the dataframe with the column named as "UPC_1990".
gdf_urban_shp['UPC_2000'] = gdf_urban_shp['2000'] / gdf_total_shp['2000']
gdf_urban_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry | UPC_1990 | UPC_2000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 7865067.0 | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | POLYGON ((7404817.438 4463862.784, 7466841.908... | 0.21177 | 0.22078 |
| 1 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 16130304.0 | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... | 0.37144 | 0.50087 |
| 2 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1603505.0 | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | POLYGON ((2339940.185 4961221.199, 2337708.178... | 0.36428 | 0.41741 |
| 3 | United Arab Emirates | ARE | Urban population | SP.URB.TOTL | 67927.0 | 74975.0 | 84367.0 | 95215.0 | 106178.0 | 116473.0 | ... | 7824294.0 | 7866602.0 | 7935897.0 | 8047166.0 | 8182523.0 | 8332898.0 | 8479744.0 | POLYGON ((5741805.754 2765811.385, 5761611.935... | 0.79051 | 0.80236 |
| 4 | Argentina | ARG | Urban population | SP.URB.TOTL | 15076842.0 | 15449950.0 | 15815502.0 | 16183085.0 | 16552517.0 | 16923103.0 | ... | 38509756.0 | 38990109.0 | 39467043.0 | 39940546.0 | 40410674.0 | 40877099.0 | 41339571.0 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... | 0.86984 | 0.89142 |
5 rows × 67 columns
The world urban population per capita for the year "2000" is saved in the dataframe with the column named as "UPC_2000".
gdf_urban_shp['UPC_2010'] = gdf_urban_shp['2010'] / gdf_total_shp['2010']
gdf_urban_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry | UPC_1990 | UPC_2000 | UPC_2010 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Urban population | SP.URB.TOTL | 755836.0 | 796272.0 | 839385.0 | 885228.0 | 934135.0 | 986074.0 | ... | 8204877.0 | 8535606.0 | 8852859.0 | 9164841.0 | 9477100.0 | 9797273.0 | POLYGON ((7404817.438 4463862.784, 7466841.908... | 0.21177 | 0.22078 | 0.23737 |
| 1 | Angola | AGO | Urban population | SP.URB.TOTL | 569222.0 | 597288.0 | 628381.0 | 660180.0 | 691532.0 | 721552.0 | ... | 16900847.0 | 17691524.0 | 18502165.0 | 19332881.0 | 20184707.0 | 21061025.0 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... | 0.37144 | 0.50087 | 0.59783 |
| 2 | Albania | ALB | Urban population | SP.URB.TOTL | 493982.0 | 513592.0 | 530766.0 | 547928.0 | 565248.0 | 582374.0 | ... | 1630119.0 | 1654503.0 | 1680247.0 | 1706345.0 | 1728969.0 | 1747593.0 | POLYGON ((2339940.185 4961221.199, 2337708.178... | 0.36428 | 0.41741 | 0.52163 |
| 3 | United Arab Emirates | ARE | Urban population | SP.URB.TOTL | 67927.0 | 74975.0 | 84367.0 | 95215.0 | 106178.0 | 116473.0 | ... | 7866602.0 | 7935897.0 | 8047166.0 | 8182523.0 | 8332898.0 | 8479744.0 | POLYGON ((5741805.754 2765811.385, 5761611.935... | 0.79051 | 0.80236 | 0.84087 |
| 4 | Argentina | ARG | Urban population | SP.URB.TOTL | 15076842.0 | 15449950.0 | 15815502.0 | 16183085.0 | 16552517.0 | 16923103.0 | ... | 38990109.0 | 39467043.0 | 39940546.0 | 40410674.0 | 40877099.0 | 41339571.0 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... | 0.86984 | 0.89142 | 0.90849 |
5 rows × 68 columns
The world urban population per capita for the year "2010" is saved in the dataframe with the column named as "UPC_2010".
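The divisions above pair rows by their positional index, which is only correct if both shapefiles list the same countries in the same order. A more defensive variant, sketched here under the assumption that "Country Co" uniquely identifies each row, aligns the two tables on the country code first. The toy frames use the 2019 values from the tables above, with the total population rows deliberately shuffled.

```python
import pandas as pd

# Toy stand-ins for gdf_urban_shp and gdf_total_shp (2019 values).
urban = pd.DataFrame({
    "Country Co": ["AFG", "AGO", "ALB"],
    "2019": [9797273.0, 21061025.0, 1747593.0],
})
total = pd.DataFrame({
    "Country Co": ["ALB", "AFG", "AGO"],  # different row order on purpose
    "2019": [2854191.0, 38041754.0, 31825295.0],
})

# Index both series by country code so the division pairs matching countries.
urban_by_code = urban.set_index("Country Co")["2019"]
total_by_code = total.set_index("Country Co")["2019"]
share = (urban_by_code / total_by_code).rename("UPC_2019")

# Positional division pairs Afghanistan's urban count with Albania's total.
naive = urban["2019"] / total["2019"]
print(share)
print(naive)
```

With code-based alignment every ratio stays below 1, whereas the positional division produces an impossible value above 1 for the first row.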
gdf_total_shp['UPC_1990'] = gdf_urban_shp['1990'] / gdf_total_shp['1990']
gdf_total_shp['UPC_2000'] = gdf_urban_shp['2000'] / gdf_total_shp['2000']
gdf_total_shp['UPC_2010'] = gdf_urban_shp['2010'] / gdf_total_shp['2010']
gdf_total_shp.head()
| | Country Na | Country Co | Indicator | Indicato_1 | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | geometry | UPC_1990 | UPC_2000 | UPC_2010 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8996973.0 | 9169410.0 | 9351441.0 | 9543205.0 | 9744781.0 | 9956320.0 | ... | 33370794.0 | 34413603.0 | 35383128.0 | 36296400.0 | 37172386.0 | 38041754.0 | POLYGON ((7404817.438 4463862.784, 7466841.908... | 0.21177 | 0.22078 | 0.23737 |
| 1 | Angola | AGO | Population, total | SP.POP.TOTL | 5454933.0 | 5531472.0 | 5608539.0 | 5679458.0 | 5735044.0 | 5770570.0 | ... | 26941779.0 | 27884381.0 | 28842484.0 | 29816748.0 | 30809762.0 | 31825295.0 | MULTIPOLYGON (((1446654.358 -529289.854, 14061... | 0.37144 | 0.50087 | 0.59783 |
| 2 | Albania | ALB | Population, total | SP.POP.TOTL | 1608800.0 | 1659800.0 | 1711319.0 | 1762621.0 | 1814135.0 | 1864791.0 | ... | 2889104.0 | 2880703.0 | 2876101.0 | 2873457.0 | 2866376.0 | 2854191.0 | POLYGON ((2339940.185 4961221.199, 2337708.178... | 0.36428 | 0.41741 | 0.52163 |
| 3 | United Arab Emirates | ARE | Population, total | SP.POP.TOTL | 92418.0 | 100796.0 | 112118.0 | 125130.0 | 138039.0 | 149857.0 | ... | 9214175.0 | 9262900.0 | 9360980.0 | 9487203.0 | 9630959.0 | 9770529.0 | POLYGON ((5741805.754 2765811.385, 5761611.935... | 0.79051 | 0.80236 | 0.84087 |
| 4 | Argentina | ARG | Population, total | SP.POP.TOTL | 20481779.0 | 20817266.0 | 21153052.0 | 21488912.0 | 21824425.0 | 22159650.0 | ... | 42669500.0 | 43131966.0 | 43590368.0 | 44044811.0 | 44494502.0 | 44938712.0 | MULTIPOLYGON (((-7640303.070 -6882033.443, -75... | 0.86984 | 0.89142 | 0.90849 |
5 rows × 68 columns
gdf_urban_shp.to_file("urban_population.shp")
gdf_total_shp.to_file("total_population.shp")
For the year 1990, we plot a choropleth map of the world urban population per capita using both matplotlib and plotly, and then compare which map is more interactive and gives more information.
from mpl_toolkits.axes_grid1 import make_axes_locatable
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
ax.set_title('Urban Population Per Capita "1990"', fontdict={'fontsize': '25', 'fontweight' : '3'})
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
gdf_urban_shp.plot(column='UPC_1990', ax=ax, legend=True, cax=cax, legend_kwds={'label':"Urban Per Capita 1990"})
<AxesSubplot:title={'center':'Urban Population Per Capita "1990"'}>
The plotly Python library is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.
import plotly.express as px
f = px.choropleth(gdf_urban_shp,
                  locationmode='country names',
                  locations=gdf_urban_shp['Country Na'],
                  color=gdf_urban_shp['UPC_1990'],
                  color_continuous_scale="Viridis",
                  labels={'UPC_1990': 'Urban Per Capita (1990)'},
                  projection="mercator")
f.update_layout(
    title_text='URBAN POPULATION PER CAPITA : 1990', title_x=0.5,
    width=900, height=700)
f.show()
Comparing the two plots for 1990, matplotlib gives a basic static plot, whereas plotly gives a more interactive and attractive one. Only a few lines of code are needed to create aesthetically pleasing, interactive plots with plotly, and its mouse-hover function saves time when exploring the plot because it shows the information for every country.
For all the next tasks we are going to use plotly to plot the choropleth maps.
f = px.choropleth(gdf_urban_shp,
                  locationmode='country names',
                  locations=gdf_urban_shp['Country Na'],
                  color=gdf_urban_shp['UPC_2000'],
                  color_continuous_scale="Inferno",
                  labels={'UPC_2000': 'Urban Per Capita (2000)'},
                  projection="mercator")
f.update_layout(
    title_text='URBAN POPULATION PER CAPITA : 2000', title_x=0.5,
    width=900, height=700)
f.show()
f = px.choropleth(gdf_urban_shp,
                  locationmode='country names',
                  locations=gdf_urban_shp['Country Na'],
                  color=gdf_urban_shp['UPC_2010'],
                  labels={'UPC_2010': 'Urban Per Capita (2010)'},
                  projection="mercator")
f.update_layout(
    title_text='URBAN POPULATION PER CAPITA : 2010', title_x=0.5,
    width=900, height=700)
f.show()